Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images
نویسندگان
چکیده
Commonly, the output of a referring expression generation system is written text which is, typically, presented to a human user as a one-shot expression. Consequently, the majority of existing REG systems interact with users in a very rigid and strictly turnbased fashion: only after the system has fully completed and delivered the result of the REG process, the user is able to read it and react accordingly. A lot of human referential communication, however, happens in situated interaction and via spoken language. Theoretically, it is well known that this change in modality fundamentally changes human production of referring expressions: Given the real-time constraints of situated interaction, a speaker often has to start uttering before she has found the optimal expression, but at the same time, she can observe the listener’s reaction while speaking and extend, adapt, or correct her referring expressions accordingly (Clark and Wilkes-Gibbs, 1986; Clark and Krych, 2004). Practically, spoken and interactive REG has been rarely studied empirically or implemented in realistic systems, but see (DeVault et al., 2005; Staudte et al., 2012; Striegnitz et al., 2012; Fang et al., 2014).
منابع مشابه
Referring in Installments: A Corpus Study of Spoken Object References in an Interactive Virtual Environment
Commonly, the result of referring expression generation algorithms is a single noun phrase. In interactive settings with a shared workspace, however, human dialog partners often split referring expressions into installments that adapt to changes in the context and to actions of their partners. We present a corpus of human–human interactions in the GIVE-2 setting in which instructions are spoken...
متن کاملEasy Things First: Installments Improve Referring Expression Generation for Objects in Photographs
Research on generating referring expressions has so far mostly focussed on “oneshot reference”, where the aim is to generate a single, discriminating expression. In interactive settings, however, it is not uncommon for reference to be established in “installments”, where referring information is offered piecewise until success has been confirmed. We show that this strategy can also be advantage...
متن کاملSegmentation Improvement of High Resolution Remote Sensing Images based on superpixels using Edge-based SLIC algorithm (E-SLIC)
The segmentation of high resolution remote sensing images is one of the most important analyses that play a significant role in the maximal and exact extraction of information. There are different types of segmentation methods among which using superpixels is one of the most important ones. Several methods have been proposed for extracting superpixels. Among the most successful ones, we can r...
متن کاملOn-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
متن کاملOn-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
متن کامل